Indonesian Journal of Electrical Engineering and Computer Science 
Vol. 29, No. 2, February 2023, pp. 1017~1029 
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v29.i2.pp1017-1029


Human ear print recognition based on fusion of difference 
theoretic texture and gradient direction pattern features 


Kawther Thabt Saleh, Raniah Ali Mustafa, Haitham Salman Chyad 
Department of Computer Science, College of Education, Mustansiriyah University, Baghdad, Iraq 


Article Info

Article history:
Received Jun 18, 2022
Revised Sep 24, 2022
Accepted Oct 14, 2022

Keywords:
Difference theoretic texture features
Fusion feature vector
Gaussian distribution
Gradient direction pattern
Human ear recognition

ABSTRACT

Human ear recognition is a branch of biometrics that uses images of the ears to identify people. This paper presents a new ear print recognition approach based on the combination of gradient direction pattern (GDP2) and difference theoretic texture features (DTTF). The first steps of the approach are extraction of the region of interest (ROI) from the grayscale ear print image, noise removal by the median filter, histogram equalization, and local normalization (LN). The preprocessed image is then used as input to the fusion of GDP2 and DTTF, which extracts the features of the ear print image. Lastly, the Gaussian distribution (GD) is used to compute the distance among the fused feature vectors (FV) in order to recognize people from their ear print images, using separate sets of training and testing images. The approach was evaluated on the unconstrained ear recognition challenge (UERC) database, which comprises ear print images of 330 subjects. Furthermore, experimental results on images from a benchmark dataset reveal that statistical-based super-resolution methods outperform other algorithms in ear recognition accuracy, which was around 93.70% in this case.

1. INTRODUCTION

Biometric schemes are more secure than traditional methods such as passwords and PINs, because the latter can be readily forgotten or tampered with. The three essential components of a biometric scheme are the input data, the processing unit (authentication, identification, and verification), and the output data. Input data are obtained from sensors and describe biometric parameters such as fingerprint, ear print, speech, iris, and face [1]. Authenticating a person has become a fundamental task in securing access to restricted schemes and resources, and biometric-based authentication schemes have been utilized for this purpose [2], [3]. In the past few years, human ear recognition (HER) has become very attractive in biometric authentication. The main reasons for preferring the human ear over other biometric modalities are its smaller size and its highly stable shape, which has been confirmed through clinical monitoring [4], [5].

The human ear is an interesting anatomical component for a passive, physiological biometric scheme that relies on digital camera images. It has several unique features that make it possible to identify specific individuals. It can serve as an efficient biometric, for instance, in crowd monitoring and terrorist identification in public places like airports, and in controlling access to governmental offices [6]. Several biometric recognition approaches based upon gradient direction pattern (GDP2) and difference theoretic texture features (DTTF) have lately been investigated. For example, Mangayarkarsi et al. [7] suggested a detection and recognition system for the human ear for use in biometrics. In the first phase, the ear is detected and segmented from the input image using the contour detection algorithm (CDA). In the second phase, the scale invariant feature transform (SIFT) is applied to the segmented ear image to extract SIFT features. A model file is created from the training images using the extracted features, and test images undergo the same process. In the final phase, the Euclidean distance (ED) metric, which computes the percentage of difference among the images, is used for ear recognition.

The value of this distance measure is thus used to identify a particular person's ear. Zarachoff et al. [8] described a method for ear recognition from low-resolution ear images that depends on principal component analysis (PCA) and super-resolution algorithms. In this study, ear images were divided into a query image and a database, then filtered and down-sampled, resulting in low-resolution ear images. The low-resolution images are subsequently enlarged to their original sizes using a range of nearest neighbor (NN) and statistical-based super-resolution approaches. The images are then subjected to PCA to obtain their eigenvalues, which are used as matching features. According to experimental results on images from a benchmark dataset, statistical-based super-resolution approaches, notably wavelet-based methods, outperform other algorithms in ear recognition accuracy.

Emeršič et al. [9] studied the problem of training convolutional neural network (CNN) based (closed-set) ear recognition models with limited training data. Various model training strategies were investigated, and the resulting model outperformed the best-performing state-of-the-art method (based upon histogram of oriented gradients (HOG) descriptors) by about 30% in terms of first-rank recognition rate (RR). The best model the authors generated was based upon the SqueezeNet model, fine-tuned with a limited set of 1,383 ear images of 166 classes, which was enlarged by a factor of 100, after its parameters had been learned with ImageNet data.

Sarangi et al. [10] proposed an approach for representing ear images through the combination of two successful local feature descriptors, local directional patterns (LDP) and pyramid histogram of oriented gradients (PHOG). PHOG expresses the spatial information of the shape, and LDP effectively encodes the local information of the texture. Because the feature sets have high dimensionality, PCA is used to lower the dimensions before normalization and fusion. The two sets of normalized heterogeneous features are then combined to generate a single feature vector (FV). Lastly, the kernel discriminant analysis (KDA) approach extracts the relevant characteristics, which are recognized using the nearest neighbor (NN) classifier. Experiments on three standard datasets (University of Notre Dame collection E and IIT Delhi versions 1 and 2) show that the suggested approach can provide adequate recognition performance when compared with existing successful techniques.

To enhance ear images, Sarangi et al. [11] proposed an automatic enhancement approach based on metaheuristic optimization. The proposed algorithm, termed the enhanced Jaya algorithm, improves ear images in a few iterations by incorporating a mutation operator into a simple, recent meta-heuristic optimization technique. After that, local features are extracted using speeded-up robust features (SURF), a pose-invariant local feature extractor. Lastly, the identification accuracy is calculated using the k-nearest neighbor (k-NN) classifier. Extensive tests are carried out on four standard datasets, with quantitative and qualitative metrics used to assess performance. Experimental results show that the suggested enhancement method is competitive when compared with two conventional approaches, contrast limited adaptive histogram equalization (CLAHE) and histogram equalization (HE), as well as two meta-heuristic algorithms, differential evolution (DE) and particle swarm optimization (PSO), all of which are image-enhancement techniques. For authentication, Thivakaran et al. [12] presented a multimodal scheme of fingerprint and ear biometrics. Fingerprint feature extraction (FE) uses hybrid techniques based on minutiae and singular points, whereas ear feature extraction uses SURF and binary robust invariant scalable keypoints (BRISK). The feature-level fusion process is then used to obtain more reliable information concerning the features.

Lastly, matching is carried out using registration and an affine transform with a similarity score. Ear biometrics appears to be an excellent solution for strengthening security requirements in numerous industries, because the ears are visible and can be captured easily, even without the awareness of the individual being checked. The experiments were based on the IIT Delhi database for ear images and the CASIA database for fingerprint images. The recommended approach achieved 95.96% accuracy with low error rates of 0.19% false rejection rate (FRR) and 0.11% false alarm rate (FAR), compared to 0.17% FAR and 0.37% FRR for the existing approach using the ear alone. In future work, the precision of the approach could be improved by using different optimization techniques and more biometric images such as palm print, face, and iris.

Tariq et al. [13] proposed an approach for human ear recognition with three stages. Preprocessing is the initial stage (contrast enhancement (CE) and size normalization of the ear image). Haar wavelets are employed in the second stage to extract features. Recognition using fast normalized cross-correlation is the final stage. The method was applied to the IIT Delhi and University of Science and Technology Beijing (USTB) ear image databases. According to experimental results, the proposed method attains an average precision of 97.2% and 95.2% on these databases.

Tian and Mu [14] suggested an ear recognition method based upon a deep convolutional neural network (DCNN). The network consists of three convolutional layers, a fully-connected layer, and a softmax classifier. Experiments on the USTB ear database suggest that their algorithm is simpler, more accurate, and better than the standard algorithm at coping with partial occlusion. Ying et al. [15] suggested a DCNN-based solution to improve human ear recognition, in which a CNN with a deep network structure is applied to the recognition task. The optimum activation function was determined to prevent network overfitting, and Dropout was implemented in the final fully connected layer. The network model was trained using a large number of human ear image samples in order to determine the learning rate, the number of feature maps, and other network characteristics. The human ear recognition test was then based upon the trained network model. In the comparison experiment, the algorithm appears resistant to occlusion, lighting, and rotation, and the rate of human ear recognition was much enhanced.

Zarachoff et al. [16] presented a two-dimensional wavelet based multi-band PCA (2D-WMBPCA) technique, inspired by PCA-based techniques for hyperspectral and multispectral images, which showed significantly higher performance than standard PCA. The technique performs a non-destructive two-dimensional wavelet transformation on the input image to split it into its sub-bands. Based on the coefficient values, it then divides each resulting sub-band into several bands. Standard PCA is applied to each of the resulting bands to extract the eigenvectors, which are then used to match the features. According to experimental results on images from two benchmark ear image datasets, 2D-WMBPCA outperforms traditional PCA as well as the eigenfaces technique. Jiddah and Yurtkan [17] contributed to human ear recognition by combining texture and geometrical features. The study employs the AMI ear database, extracting local binary pattern (LBP) features and running a Laplacian filter (LF) on the raw images to extract the geometrical characteristics. To discover the regions of high importance in the ear images, each ear image was divided into four quarters, and experiments were run on each quarter separately. After that, the geometric and texture features are fused, and studies are conducted to verify the contribution of the fused features.

Al Rahhal et al. [18] presented a new ear recognition descriptor, dense local phase quantization (DLPQ), which is based upon the phase responses generated by the well-known LPQ descriptor. Local dense histograms are generated from horizontal stripes of the phase maps, followed by a pooling procedure to account for changes in viewpoint, and finally the ear descriptor is formed by concatenation. Although the DLPQ descriptor is based upon the classic LPQ, the authors show that it achieves significant improvements (over 20%) compared to the latter on two benchmark datasets. Othman et al. [19] offer a fully automated ear-based biometric system that does not require human interaction and can be used in real time.

The suggested approach is designed to distinguish persons by their ear shape, which is retrieved from a profile facial image in which the ear is frequently partially obscured by earrings and/or hair. A cascaded classifier based on Haar-like features is first used to find the ear in the profile image. After that, a new method based on the shape context descriptor is applied. Testing on a few standard datasets gave encouraging results: 100% recognition was achieved for non-occluded images, whereas 57% accuracy was reached for images in which the ear was occluded by both earring and hair.

Khaldi and Benzaoui [20] suggest employing a tight region of interest (ROI) segmentation of the ear to ensure that the classifier relies solely on ear pixels rather than on irrelevant pixels. This work used image-to-image translation to generate the ear ROI segmentation and remove the irrelevant pixels from the input images. Additionally, ear components missing owing to distortion or occlusion can be synthesized. To do this, the authors employed the Pix2Pix generative adversarial network (GAN), trained on the annotated web ears (AWE) dataset, which is a difficult ear dataset. According to the results of the experiments, the use of ear ROI segmentation improves the classification process and dramatically boosts the rate of recognition.


2. PROPOSED METHOD 
2.1. Difference theoretic texture features (DTTF) 

Texture identification and classification under diverse rotation, lighting, and scale conditions is a difficult problem in pattern recognition, and grey-level difference statistics have been widely used to solve it. DTTF is a set of rotation-, lighting-, and scale-invariant texture classification features derived from correlated distributions of global and local grey-level differences in texture image intensities. Its authors examine the terms in the correlation formula to construct a difference-based feature set that is invariant and distinctive for a texture class. Their studies use the nearest neighbor classifier, and the findings show that the proposed feature vector has high classification accuracy under diverse rotation, lighting, and scale conditions [21]. DTTF has also been used in another system, for color textures. Because the color levels contribute information regarding the same repeating pattern, color textures have a distinctive relation between their color levels, and the average information, or entropy, is therefore considered redundant across color levels.

That system focuses on lowering the dimensionality of the color texture features while keeping high color texture identification accuracy, by averaging the entropies across multi-dimensional color levels. Susan and Hanmandlu [21] used the mean operation to summarize the original 11-dimensional difference theoretic texture features for classification. In the later work, the entropy of the features spanning multiple color levels is used rather than the mean. The non-extensive entropy with Gaussian information gain is used as the entropy measure because it is nonlinear and a good predictor of the regular patterns in textures. Concerning feature dimension reduction, comparisons to the state of the art reveal that the method is efficient and accurate [22]. DTTF has also been used to recognize dynamic faces in videos using 3D difference theoretic texture features (3D-DTTF).

Along with the vertical, horizontal, and diagonal directions of the 2D DTTF, the 3D-DTTF extends the grey-level difference statistics along the front (F), front-diagonal vertical (FDV), and front-diagonal horizontal (FDH) directions. The new 3D features are affine invariant, like their 2D counterparts, which is useful for detecting faces in videos despite changes in facial expression. On the Cohn-Kanade facial expression video dataset, the recommended dynamic face recognition algorithm outperforms existing techniques [23].


2.2. Gradient direction pattern (GDP2) 

To extract features from the grayscale image, only the eight pixels around each pixel are used. The 3x3 pixel region is multiplied by the Kirsch edge response mask in each of eight directions to obtain eight mask values. After the direction masks are applied, the pixel's original grey value is substituted with the value of the corresponding mask. Figure 1 shows how the GDP code is obtained from a 3x3 region: the corresponding mask values in the north-west (NW) and south-east (SE) direction masks, in the N and S direction masks, in the north-east (NE) and south-west (SW) direction masks, and in the W and E direction masks are illustrated in Figure 1(a), Figure 1(b), Figure 1(c), and Figure 1(d), respectively. Applying GDP to that region produces a code with higher stability in noisy environments than the general GDP code. Furthermore, the GDP code is a four-bit binary pattern that can yield up to 16 different combinations [24].


Figure 1. Obtaining GDP code from a 3x3 region: corresponding mask value in (a) NW and SE direction mask, (b) N and S direction mask, (c) NE and SW direction mask, and (d) W and E direction mask (GDP code = 0011 = 3 (dec))
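
The mask-response step described above can be illustrated with a minimal Python sketch that computes the eight Kirsch edge responses of one 3x3 region. It assumes the standard Kirsch compass masks (three adjacent border weights of 5, the remaining five of -3, centre 0). The helper names kirsch_mask and kirsch_responses and the sample region are hypothetical, and the rule that turns the responses into the four GDP2 bits is not reproduced because the text does not state it in full.

```python
# Border positions of a 3x3 region, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def kirsch_mask(k):
    """Return the 3x3 Kirsch mask rotated k steps clockwise from north."""
    mask = [[0] * 3 for _ in range(3)]  # centre weight stays 0
    for pos, (r, c) in enumerate(RING):
        # Three consecutive ring positions carry weight 5, the other five carry -3.
        mask[r][c] = 5 if (pos - k) % 8 < 3 else -3
    return mask


def kirsch_responses(region):
    """Edge response of a 3x3 grey-level region for each of the eight directions."""
    responses = {}
    for k, name in enumerate(DIRECTIONS):
        mask = kirsch_mask(k)
        responses[name] = sum(
            mask[r][c] * region[r][c] for r in range(3) for c in range(3)
        )
    return responses


# Hypothetical region whose top row is brighter than the rest.
region = [[10, 10, 10], [0, 0, 0], [0, 0, 0]]
print(kirsch_responses(region)["N"])  # 150
```

Because the weights of every mask sum to zero, a perfectly flat region gives a response of 0 in all eight directions.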

The grayscale image was separated into 81 equal-sized blocks, and the feature vector was created by concatenating the histograms of GDP codes from every block. Figure 2 shows how the mask value is obtained for a 3x3 region: the local region from the grayscale image, the mask value for the pixel '20' using the NW directional mask, the mask value for the pixel '52' using the N directional mask, the corresponding mask values in eight directions, and the new representation of (a) using the mask values are illustrated in Figure 2(a), Figure 2(b), Figure 2(c), Figure 2(d), and Figure 2(e), respectively. Only single-transition patterns were considered while calculating GDP. As a result, every block's histogram length was 8, and the FV length for the entire image was 81x8 = 648 [25].




Figure 2. Obtaining mask value for a 3x3 region (a) local region from the gray scale image, (b) mask value 
for the pixel ‘20’ using NW directional mask, (c) mask value for the pixel ‘52’ using N directional mask, 
(d) corresponding mask value in eight directions, and (e) new representation of (a) using mask value 
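
The feature vector length of 81x8 = 648 can be checked with a short Python sketch. Two points are assumptions, not statements from the text: the 81 blocks are taken to form a 9x9 grid, and "single-transition patterns" is read as four-bit codes with at most one 0/1 change when the bits are read in order, which happens to leave exactly 8 codes. The names transitions, KEPT_CODES, and block_histogram_vector are hypothetical, and the input is an already computed map of GDP codes.

```python
def transitions(code, bits=4):
    """Count 0/1 changes between neighbouring bits of a code."""
    b = [(code >> i) & 1 for i in range(bits)]
    return sum(b[i] != b[i + 1] for i in range(bits - 1))


# Assumed reading of "single-transition patterns": at most one bit change.
KEPT_CODES = [c for c in range(16) if transitions(c) <= 1]


def block_histogram_vector(codes, grid=9):
    """Concatenate per-block histograms of the kept codes over a grid x grid tiling."""
    rows, cols = len(codes), len(codes[0])
    bh, bw = rows // grid, cols // grid  # pixels beyond grid*bh or grid*bw are ignored
    index = {c: i for i, c in enumerate(KEPT_CODES)}
    vector = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * len(KEPT_CODES)
            for r in range(by * bh, (by + 1) * bh):
                for c in range(bx * bw, (bx + 1) * bw):
                    code = codes[r][c]
                    if code in index:  # codes with more transitions are skipped
                        hist[index[code]] += 1
            vector.extend(hist)
    return vector


# Hypothetical 18x18 map of four-bit codes, used only to check the vector length.
codes = [[(r * 7 + c * 3) % 16 for c in range(18)] for r in range(18)]
print(len(KEPT_CODES), len(block_histogram_vector(codes)))  # 8 648
```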


3. PROPOSED SCHEME 

The suggested method is divided into three phases: training, testing, and recognition. The training phase is divided into two parts. The first is preprocessing, which includes several sub-phases (grayscale conversion, ROI extraction, histogram equalization, noise removal, and local normalization (LN)). The second is feature extraction for ear print images, which combines DTTF and GDP2 and stores the features of each class, for each ear print image sample of an individual, in the training database features (TRDBF). The testing phase involves the same critical phases (preprocessing and feature extraction). Lastly, the Gaussian distribution (GD) is used in the recognition phase. Figure 3 depicts the framework of the suggested method.

[Figure 3 blocks: ear print images acquisition (dataset); preprocessing phase; feature extraction using fusion of difference theoretic texture features (DTTF) and gradient direction pattern (GDP2); training database features (TRDBF); recognition phase (using Gaussian distribution (GD)); result of decision]

Figure 3. The general model of the proposed scheme


4. RESULTS AND DISCUSSIONS 

This section presents and explains the results of the ear print recognition system for three phases: the input images phase, the preprocessing phase (grayscale conversion sub-phase, region of interest (ROI) sub-phase, histogram equalization (HE) sub-phase, median filter sub-phase, and local normalization (LN) sub-phase), and the feature extraction phase.


4.1. Input images phase 

This phase of the proposed technique loads an image into the recognition scheme, after which it is available to the subsequent phases. The suggested approach can read an image in any image format; the BMP format was used for the color ear print images. Figure 4 depicts the array that holds the loaded image.


Figure 4. Example of the 3D array contents

4.2. Pre-processing phase

The preprocessing in the suggested scheme consists of five sub-phases:

- Convert to grayscale sub-phase: in this sub-phase, the color ear print input image is converted to a grayscale ear print image using (1). Figure 5 shows the conversion to grayscale: the original image is illustrated in Figure 5(a) and the gray image in Figure 5(b).


Gray scale = 0.2989R + 0.5870G + 0.1140B (1)


Figure 5. Conversion to grayscale for (a) original image and (b) gray image
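
Equation (1) can be applied pixel by pixel, as in the following minimal Python sketch. The image is assumed to be a nested list of (R, G, B) tuples, and the function name to_gray is hypothetical.

```python
def to_gray(rgb_image):
    """Apply equation (1) to every (R, G, B) pixel of a nested-list image."""
    return [
        [0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 1x3 image: pure red, pure green, pure blue.
gray = to_gray([[(255, 0, 0), (0, 255, 0), (0, 0, 255)]])
```

Green receives the largest weight, so a pure green pixel maps to a brighter grey value than a pure red or pure blue pixel of the same intensity.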


- Region of interest (ROI) sub-phase: after the ear print image is converted to grayscale, logical indexing is used to extract the ROI. The idea is identical to indirect indexing, except that each index is a logical value. Assume that there is a matrix Mat of randomly generated integers in which each value less than 10 is to be substituted with the lower bound 10. The instruction idx <- Mat < 10 returns a logical array that is true at each position satisfying the condition. Each of those values is then substituted [26]. Figure 6 shows the logical indexing example.


Mat <- matrix(sample(25), nrow = 5)  # 5x5 matrix holding a random permutation of 1..25
idx <- Mat < 10                      # logical matrix: TRUE where the value is below 10
Mat[idx] <- 10                       # substitute every selected value with the lower bound 10


Figure 6. Logical indexing example


Logical indexing is another variation of indexing that is both useful and expressive. In logical indexing, the matrix subscript is a single logical array, and MATLAB returns the matrix elements that correspond to the nonzero values of the logical array. The result is always presented as a column vector. Logical indexing (LI) is a good way to obtain and modify the ROI pixels in MATLAB. If B is logical and the same size as A, the expression A(B) selects every element of A that corresponds to a true element of B.

The mask used here is produced by MATLAB's roipoly function, mask = roipoly(D); the workspace browser shows that the mask is a logical array. Figure 7 depicts the mask for the image: the gray image is illustrated in Figure 7(a) and the mask image in Figure 7(b).


Figure 7. The two images: (a) gray image and (b) mask image

Multiplying the grayscale ear print image by the mask image gives the ear print image restricted to the region of interest. Figure 8 depicts the application of the region of interest: the gray image, mask image, and region of interest are illustrated in Figure 8(a), Figure 8(b), and Figure 8(c), respectively.


Figure 8. The three images: (a) gray image, (b) mask image, and (c) region of interest
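
The mask multiplication and the logical selection A(B) can be sketched in Python as follows. This is an illustration under assumptions: the image and the mask are nested lists of equal size, the mask holds booleans, and the names apply_mask and roi_pixels are hypothetical. The column-by-column order in roi_pixels mirrors the column vector that MATLAB returns for A(B).

```python
def apply_mask(gray, mask):
    """Element-wise product with a logical mask: pixels outside the ROI become 0."""
    return [
        [g if m else 0 for g, m in zip(g_row, m_row)]
        for g_row, m_row in zip(gray, mask)
    ]


def roi_pixels(gray, mask):
    """Select the pixels where the mask is true, scanning column by column."""
    rows, cols = len(gray), len(gray[0])
    return [gray[r][c] for c in range(cols) for r in range(rows) if mask[r][c]]


# Hypothetical 2x2 example.
gray = [[1, 2], [3, 4]]
mask = [[True, False], [True, True]]
print(apply_mask(gray, mask))  # [[1, 0], [3, 4]]
print(roi_pixels(gray, mask))  # [1, 3, 4]
```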


- Histogram equalization (HE) sub-phase: in this sub-phase, after the ROI of the ear print image is obtained, HE is applied to adjust the contrast of the ear print image.

Let f be an image represented as an r x c array of integer pixel intensities that range between 0 and L-1, where L is the number of gray level values in the image, typically 256. Let p denote the normalized histogram (NH) of f, with one bin for each possible intensity. Figure 9 shows the HE: the region of interest and the histogram-equalized image are illustrated in Figure 9(a) and Figure 9(b), respectively. Equations (2) and (3) define HE [27], [28].


p_n = \frac{\text{number of pixels with intensity } n}{\text{total number of pixels}}, \quad n = 0, 1, \ldots, L-1 (2)

The HE image g is determined through:

g_{i,j} = \left\lfloor (L-1) \sum_{n=0}^{f_{i,j}} p_n \right\rfloor (3)


Figure 9. The two images: (a) region of interest and (b) histogram equalization
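
Equations (2) and (3) can be followed step by step in a minimal Python sketch. The image is assumed to be a nested list of integers in the range 0 to L-1, the function name equalize_histogram is hypothetical, and the cumulative sum is kept in integer pixel counts so that the floor is exact.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a nested-list image with integer intensities in [0, levels-1]."""
    total = sum(len(row) for row in image)
    counts = [0] * levels
    for row in image:
        for v in row:
            counts[v] += 1  # counts[n] / total is p_n of equation (2)
    cumulative, running = [], 0
    for n in range(levels):
        running += counts[n]
        cumulative.append(running)  # total * sum of p_0 .. p_n
    # Equation (3): floor((L-1) * sum_{n<=f} p_n), done with integer division.
    return [[((levels - 1) * cumulative[v]) // total for v in row] for row in image]


# Hypothetical 2x2 image with four grey levels.
print(equalize_histogram([[0, 0], [1, 3]], levels=4))  # [[1, 1], [2, 3]]
```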


- Median filter sub-phase: in this sub-phase, after HE is applied to the ear print image, a median filter is used to remove noise while limiting the blurring of edges. De-noising algorithms may perform better if they take into account not only the noise but also the spatial characteristics of the image [21]. The median filter is a non-linear smoothing approach that reduces edge blur. It replaces the current point in the image with the median of the brightness values in its neighborhood. Individual noise spikes do not significantly affect the median brightness in the neighborhood; hence median smoothing efficiently removes impulsive noise [29]. In terms of performance, the median filter is reported to be the best filtering strategy with the least computational time, and it smooths salt-and-pepper noise [30], [31]. In the proposed scheme, the median filter is applied to the ear print image and visibly enhances it. Figure 10 shows the application of the median filter to the ear print image: the histogram-equalized image is illustrated in Figure 10(a) and the median-filtered image in Figure 10(b).


Figure 10. The two images: (a) histogram equalization and (b) median filtered image
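
The neighborhood-median substitution can be sketched in a few lines of Python. The text does not state the window size or the border handling, so the 3x3 window and the replicated borders below are assumptions, and the function name median_filter is hypothetical.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    Border pixels reuse the nearest valid row or column (replicated borders).
    """
    h, w = len(image), len(image[0])
    k = size // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-k, k + 1)
                for dc in range(-k, k + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


# Hypothetical 3x3 patch with one impulsive ("salt") pixel in the centre.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(noisy))  # [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```

The single outlier never reaches the middle of a sorted window, which is why an isolated noise spike is removed without averaging it into its neighbors.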


- Local normalization (LN) sub-phase: in this sub-phase, after a better ear print image is obtained by applying the median filter, LN is applied. It is an important sub-phase because it keeps the useful information under a constant form of illumination, which makes feature extraction and recognition of the ear print image easier. Changing illumination conditions lead to lower detection rates, and their effect can be removed through illumination normalization, so normalization strategies must be well considered in a human ear print recognition scheme [32]-[34]. In the proposed scheme, LN is applied to the ear print image; the better the enhancement of the ear print image, the more useful it is in the next phase of the suggested scheme. Figure 11 shows the local normalized ear print image: the median-filtered image is illustrated in Figure 11(a) and the local normalized image in Figure 11(b).


Figure 11. The two images: (a) median filtered image and (b) local normalized image
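
The text does not give the LN formula. One common definition, used here purely as an assumption, subtracts the local mean from each pixel and divides by the local standard deviation. The 3x3 window, the replicated borders, the small constant eps, and the name local_normalize are likewise hypothetical choices for this sketch.

```python
import math


def local_normalize(image, size=3, eps=1e-6):
    """Subtract the local mean and divide by the local standard deviation."""
    h, w = len(image), len(image[0])
    k = size // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-k, k + 1)
                for dc in range(-k, k + 1)
            ]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            # eps avoids division by zero in perfectly flat neighbourhoods
            row.append((image[r][c] - mean) / (math.sqrt(var) + eps))
        out.append(row)
    return out


# Hypothetical patch and a brighter, higher-contrast copy of it.
patch = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[2 * v + 5 for v in row] for row in patch]
a, b = local_normalize(patch), local_normalize(brighter)
```

Under this definition, a change of brightness or contrast that is uniform over the window leaves the output almost unchanged, so a and b agree up to the effect of eps.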


4.3. Feature extraction phase (FEP) 


After preprocessing, this phase extracts only the important information from the ear print image. The proposed scheme uses the fused vector of DTTF and GDP2 and stores the features of all classes, for each ear print image sample of an individual, in the training database features (TRDBF). The features extracted by the two methods (DTTF and GDP2) for some samples of local normalized images are shown in Tables 1 and 2.


Table 1. Feature extraction of difference theoretic texture features for some samples (rows: the 11 DTTF features; columns: classes)

Feature   Class 1             Class 2             Class 3             Class 4             Class 5
1         0.267679329208246   0.268419163783582   0.298187472560697   0.264549462696793   0.300405138847354
2         0.372102268291585   0.033778827679176   0.343838284179761   0.339970720365522   0.343813697857417
3         0.373155528504622   0.372708138931065   0.373639885974050   0.370013238135473   0.371273048826711
4         0.039911546489760   0.034935003203340   0.038954070065003   0.040701987286327   0.036427868934169
5         0.029621957283130   0.026600484573518   0.030740634030288   0.030585025875934   0.028321656643313
6         0.026663214110742   0.024643955170263   0.026205142606364   0.028962081453575   0.027095756152297
7         0.249497697705792   0.249969665913664   0.269145936394669   0.249679519633079   0.250538926150702
8         0.2844442872457     0.290122305460061   0.259168002618974   0.260958268212029   0.260046375508462
9         0.279073813402138   0.280439000086879   0.259766608069393   0.270198343236292   0.260806660245598
10        0.322059644119288   0.323381256381653   0.282192054496414   0.293685452361139   0.282407677486253
11        0.068702076712597   0.063740304906973   0.070587982734770   0.073906936543466   0.067668352479131

Table 2. Feature extraction of gradient direction pattern for some samples

Class   Gradient direction pattern
1       9831    19069   14390   26776   12523   29393   12236   211851
2       10166   18582   13287   28082   13208   27674   12100   212932
3       9581    17175   13730   28218   14019   28576   11236   213420
4       10313   17994   13598   28064   13422   28278   12232   212360
5       10900   18769   12812   27216   13277   27920   11853   213586


4.4. Recognition phase 

In this phase, after the fusion of DTTF and GDP2 is obtained, the Gaussian distribution is used to calculate the distance among features for ear print recognition, and this distance determines which class a sample belongs to. The Gaussian distribution values (mean and variance) for each class are shown in Table 3.


Table 3. Computation of the Gaussian distribution

Class   Mean               Variance
1       73.2577105503673   6910.98541538104
2       120.149241739909   160.732157314637
3       51.6057133992513   160.732157314637
4       73.2577105503673   89.4119000790942
5       135.065880569067   1813.01670162773
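
The text does not give the formula that turns the Gaussian distribution into a distance. One plausible reading, offered only as a sketch, is to summarize each class by the mean and variance of its fused feature vector (the two quantities reported per class in Table 3) and to assign a test sample to the class whose Gaussian gives the highest log-density to the mean of the test vector. The function names, the class labels, and the sample values below are all hypothetical.

```python
import math


def gaussian_params(vector):
    """Mean and (population) variance of a feature vector."""
    mean = sum(vector) / len(vector)
    var = sum((v - mean) ** 2 for v in vector) / len(vector)
    return mean, var


def log_density(x, mean, var):
    """Logarithm of the Gaussian density N(x; mean, var)."""
    var = max(var, 1e-12)  # guard against a zero variance
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def recognize(test_vector, class_params):
    """Return the label whose Gaussian best explains the mean of the test vector."""
    x, _ = gaussian_params(test_vector)
    return max(class_params, key=lambda label: log_density(x, *class_params[label]))


# Hypothetical training summaries for two classes.
class_params = {
    "A": gaussian_params([9.0, 10.0, 11.0]),
    "B": gaussian_params([48.0, 50.0, 52.0]),
}
print(recognize([10.5, 9.5, 10.0], class_params))  # A
```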


The ear print recognition scheme was evaluated using two measures: the recognition rate (RR) and the false alarm rate (FAR). Equations (4) and (5) were used to calculate these two measures [35]-[39]. The best recognition rates achieved are shown in Table 4.


$$\mathrm{FAR} = \frac{\text{Number of false recognition attempts}}{\text{Total number of attempts}} \times 100 \tag{4}$$

$$\mathrm{RR} = \frac{\text{Number of correct attempts}}{\text{Total number of attempts}} \times 100 \tag{5}$$
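The two measures in (4) and (5) can be computed as in the sketch below. It assumes that every recognition attempt counts as either correct or false, so the two rates sum to 100 under this assumption; the source does not state how false attempts are counted. The function name `recognition_metrics` is hypothetical.

```python
def recognition_metrics(true_labels, predicted_labels):
    """Return (RR, FAR) in percent, following (5) and (4)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    total = len(true_labels)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    rr = 100.0 * correct / total             # (5): correct attempts
    far = 100.0 * (total - correct) / total  # (4): false attempts (assumed)
    return rr, far


# Hypothetical labels: three of four attempts are correct.
rr, far = recognition_metrics([1, 2, 3, 4], [1, 2, 3, 5])
print(rr, far)
```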


Table 4. RR obtained for training and testing for all of the classes
Criteria of evaluation   Recognition rates
                         FAR      RR
Training                 0.0      100%
Testing                  14.6%    93.70%


In the proposed scheme, 3-fold cross-validation was applied: the dataset was split into three equal
portions, two of which were used for training and the third for testing. The proposed scheme was compared
with another study that used the same UERC earprint database but different recognition methods based on
texture and geometric features. In this work, better results were obtained by using the fusion of DTTF
and GDP2.
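The three-fold split described above can be sketched as follows. This is a minimal illustration that assumes a random, unstratified split over sample indices; the source does not say how samples were assigned to portions. The function name `three_fold_splits` and the seed are hypothetical, and 330 is the sample count quoted in the paper.

```python
import random


def three_fold_splits(n_samples, seed=0):
    """Yield (train_indices, test_indices) for each of three folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::3] for i in range(3)]  # three near-equal portions
    for k in range(3):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


for train, test in three_fold_splits(330):
    print(len(train), len(test))
```

Each sample appears in exactly one test portion, so averaging the three test results uses every sample once.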

Table 5 lists the dataset, pre-processing, feature extraction, methods/techniques, and RR used in
previous works. In this comparison, the performance analysis indicates that the proposed ear print
recognition is efficient owing to the set of pre-processing sub-phases (conversion to grayscale, ROI
extraction, HE, median filter, and LN) and to the advantages of fusing DTTF and GDP2. Additionally,
this work used the UERC dataset and gave good results with the fusion of DTTF and GDP2.
Table 5. Comparison of the previous approaches

Ref | Year | Dataset | Pre-processing | Feature extraction | Methods/techniques | RR
[7] | 2019 | Ear images of about 25 people, collected irrespective of age; the dataset has both left and right images of the human ear; an online human ear dataset is also used | Original image resized to [492x702] pixels to match the size of the images in the database; resized images converted into grayscale images; contour detection algorithm | SIFT algorithm | Euclidean distance measure approach | -
[8] | 2018 | Top-1, Top-2, Top-4, Top-5; Top-5 dataset | Original image converted to gray image; the mean adjusted image X' is created | PCA | Euclidean distance | -
[9] | 2017 | AWED and CVLED datasets | - | CNNs | CNNs | -
[10] | 2018 | IIT Delhi-I, IIT Delhi-II, UND-E | Gaussian filter operation, CLAHE, ear edge image using Kirsch edge detection | PHOG, LDP, and PCA | KDA-NN | 97.34
[11] | 2019 | IIT Delhi (ITD), Univ. of Notre Dame collection E (UNDE), collection J2 (UND-J2), AWE, low-resolution camera | Localize ear shape, Enhanced Jaya algorithm (EJA) | SURF feature extraction technique | k-NN classifier | -
[12] | 2019 | IIT Delhi database | Histogram standardization; ROI discovered with the use of the general protest discovery system | BRISK and SURF techniques | Matching (registration and similarity score process) | 80.21%
[13] | 2011 | USTB ear image database and IIT Delhi ear image | Cropping image, normalization of cropped ear images, conversion into grayscale, and contrast enhancement of gray-scale image | Haar wavelet transform (HWT) | Fast normalized cross-correlation (NCC) | 97.2% and 95.2%
[14] | 2016 | USTB ear | Cropping, resizing, contrast maximization, and image means subtraction | Forward propagation and error back propagation | Deep convolutional neural network (DCNN) | 98.27%
[15] | 2018 | - | - | - | DCNN | 95% and up to 98%
[16] | 2019 | ITD II dataset | HE, wavelet decomposition (WD) | Multiple-image generation (MIG), PCA | Multi-band PCA (2DWMBPCA), Eigenvector | -
[17] | 2018 | AMI ear database | Laplacian filter (LF) | LBP, splitting ear images, histogram extraction | Geometric and texture feature fusion | 80%
[18] | 2018 | IIT Delhi-1 and IIT Delhi-2 | - | Local dense histograms (LDH) | Dense local phase quantization (DLPQ) | -
[19] | 2018 | WPUT and NCKU datasets | Converted to gray image, HE, ear detection (ED) using Haar feature-based cascade (HFBC) | Shape context computation | - | 100% for non-occluded images, 57% for occluded by both hair and earring
[20] | 2020 | AWE ear dataset | ROI segmentation, GAN | LBP, LPQ, BSIF | LBP, LPQ, and BSIF | -

5. CONCLUSION 


In this paper, a method for recognizing the human ear for biometrics has been proposed and
implemented. Images of the human ear from the UERC database were used. Around 330 images were collected,
of which 80% were used for training and the remaining 20% for testing. Initially, the ear is segmented
from the input image using the logical indexing method. The fusion of DTTF and GDP2 is then applied to
the segmented ear image to extract the fused features. The features extracted from the training images
generate the model file. The test images go through the same process, and for recognition, a Gaussian
distribution is used to calculate the distance between feature vectors. The specific person's ear is thus
identified from the value of this distance measure. Further research will focus on minimizing the feature
values for better recognition.